Overview
Dataset statistics
| Number of variables | 20 |
|---|---|
| Number of observations | 50000 |
| Missing cells | 50140 |
| Missing cells (%) | 5.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 33.7 MiB |
| Average record size in memory | 707.3 B |
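The Missing cells figure can be cross-checked against the per-column counts reported elsewhere in this profile. The sketch below assumes that the four columns listed are the only ones with missing values: the two Function*_source columns from the alerts, plus Molecular_weight and Instability_index. Their counts do add up to the reported 50,140.

```python
# Per-column missing counts as reported in this profile;
# every other column is assumed to have 0 missing values.
missing_per_column = {
    "Function_prediction_source": 22808,
    "Function_Prediction_source": 27192,
    "Molecular_weight": 70,
    "Instability_index": 70,
}
n_rows, n_columns = 50000, 20

total_missing = sum(missing_per_column.values())
total_cells = n_rows * n_columns  # 1,000,000 cells in total
missing_pct = 100 * total_missing / total_cells

print(total_missing)          # 50140
print(f"{missing_pct:.1f}%")  # 5.0%
```

The per-variable "Missing (%)" figures use the row count (50,000) as the denominator, for example 22,808 / 50,000 = 45.6%. The dataset-level figure uses all 1,000,000 cells.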
Variable types
| Text | 4 |
|---|---|
| Categorical | 5 |
| Numeric | 11 |
Alerts
| Aromaticity is highly overall correlated with Oxidized_coefficient and 1 other field | High correlation |
|---|---|
| Function_Prediction_source is highly overall correlated with Protein_source | High correlation |
| Function_prediction_source is highly overall correlated with Phage_source and 1 other field | High correlation |
| Molecular_weight is highly overall correlated with Oxidized_coefficient and 1 other field | High correlation |
| Oxidized_coefficient is highly overall correlated with Aromaticity and 2 other fields | High correlation |
| Phage_source is highly overall correlated with Function_prediction_source and 1 other field | High correlation |
| Protein_source is highly overall correlated with Function_Prediction_source and 2 other fields | High correlation |
| Reduced_coefficient is highly overall correlated with Aromaticity and 2 other fields | High correlation |
| Start is highly overall correlated with Stop | High correlation |
| Stop is highly overall correlated with Start | High correlation |
| Protein_source is highly imbalanced (93.7%) | Imbalance |
| Function_prediction_source has 22808 (45.6%) missing values | Missing |
| Function_Prediction_source has 27192 (54.4%) missing values | Missing |
| Protein_ID has unique values | Unique |
| Aromaticity has 8208 (16.4%) zeros | Zeros |
| Instability_index has 763 (1.5%) zeros | Zeros |
| Helix_fraction has 2122 (4.2%) zeros | Zeros |
| Turn_fraction has 2961 (5.9%) zeros | Zeros |
| Sheet_fraction has 2307 (4.6%) zeros | Zeros |
| Reduced_coefficient has 13681 (27.4%) zeros | Zeros |
| Oxidized_coefficient has 13180 (26.4%) zeros | Zeros |
Reproduction
| Analysis started | 2025-07-21 08:30:44.767931 |
|---|---|
| Analysis finished | 2025-07-21 08:30:58.809486 |
| Duration | 14.04 seconds |
| Software version | ydata-profiling v0.0.dev0 |
Variables
Phage_ID
Text
| Distinct | 47854 |
|---|---|
| Distinct (%) | 95.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 4.4 MiB |
Length
| Max length | 87 |
|---|---|
| Median length | 84 |
| Mean length | 34.57618 |
| Min length | 5 |
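A quick feasibility check suggests that the reported Median length should be read with caution. If it were the ordinary median of per-row string lengths, at least half of the rows would be at least that long and the rest at least the minimum. The mean could then not fall below (median + minimum) / 2. For Phage_ID that bound is (84 + 5) / 2 = 44.5, which is above the reported mean of 34.58. Protein_ID and Product show the same conflict. The helper below is a hypothetical illustration of the check, not part of ydata-profiling.

```python
def median_is_feasible(median: float, minimum: float, mean: float) -> bool:
    """Can a dataset with this minimum and mean have this median?

    At least half of the values are >= the median and all values are >= the
    minimum, so the mean must be at least (median + minimum) / 2.
    """
    return mean >= (median + minimum) / 2


# (reported median length, min length, mean length) for three text columns
reported = {
    "Phage_ID": (84, 5, 34.57618),
    "Protein_ID": (85, 8, 37.44138),
    "Product": (761, 2, 26.1706),
}
for column, (median, minimum, mean) in reported.items():
    bound = (median + minimum) / 2
    print(column, median_is_feasible(median, minimum, mean), bound)
# Phage_ID False 44.5
# Protein_ID False 46.5
# Product False 381.5
```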
Unique
| Unique | 45815 |
|---|---|
| Unique (%) | 91.6% |
Sample
| 1st row | NC_013650.1 |
|---|---|
| 2nd row | NC_021349.1 |
| 3rd row | NC_010392.1 |
| 4th row | NC_021071.1 |
| 5th row | NC_019510.1 |
| Value | Count | Frequency (%) |
| mgv-genome-0379339 | 5 | < 0.1% |
| mgv-genome-0378063 | 4 | < 0.1% |
| kj019095.1 | 4 | < 0.1% |
| mgv-genome-0341507 | 4 | < 0.1% |
| mgv-genome-0378082 | 4 | < 0.1% |
| imgvr_uvig_3300045988_112928|3300045988|ga0495776_101926 | 4 | < 0.1% |
| mgv-genome-0376837 | 4 | < 0.1% |
| imgvr_uvig_3300045988_068024|3300045988|ga0495776_021873 | 3 | < 0.1% |
| uvig_417742 | 3 | < 0.1% |
| imgvr_uvig_2846161197_000008|2846161197|2846161252|18420-67883 | 3 | < 0.1% |
| Other values (47844) | 49962 | 99.9% |
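Distinct and Unique measure different things. Distinct counts how many different values occur. Unique, in ydata-profiling's usual definition, counts only the values that occur exactly once. For Phage_ID that gives 47,854 distinct values but only 45,815 singletons, and both are expressed as a share of the 50,000 rows. Below is a minimal sketch of both definitions on a hypothetical toy column:

```python
from collections import Counter


def distinct_and_unique(values):
    """Distinct = number of different values; Unique = values seen exactly once."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# Hypothetical toy column: one ID three times, one twice, two singletons.
ids = ["NC_1", "NC_1", "NC_1", "NC_2", "NC_2", "NC_3", "NC_4"]
print(distinct_and_unique(ids))  # (4, 2)
```

For Protein_ID, both counts equal the row count (50,000). That is why the column is flagged as Unique and could serve as a row key.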
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 189853 | 11.0% |
| _ | 137979 | 8.0% |
| 3 | 107088 | 6.2% |
| 1 | 90940 | 5.3% |
| 2 | 84114 | 4.9% |
| 8 | 82186 | 4.8% |
| 5 | 79728 | 4.6% |
| 4 | 79075 | 4.6% |
| 9 | 73268 | 4.2% |
| 7 | 70447 | 4.1% |
| Other values (56) | 734131 | 42.5% |
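Character frequencies are shares of the total number of characters, not of rows. Because every character is counted, that total equals the mean length times the row count: 34.57618 × 50,000 = 1,728,809. This matches the count reported under Most occurring categories, so the 189,853 occurrences of "0" amount to 11.0%. The sketch below shows the counting on the first two sample Phage_ID values:

```python
from collections import Counter


def char_frequencies(values):
    """Count every character across all strings; shares use the total character count."""
    counts = Counter()
    for value in values:
        counts.update(value)
    total = sum(counts.values())
    return total, {ch: 100 * n / total for ch, n in counts.most_common()}


total, shares = char_frequencies(["NC_013650.1", "NC_021349.1"])
print(total)  # 22

# Report-level check: total characters = mean length x number of rows.
report_total = round(34.57618 * 50000)
print(report_total)                           # 1728809
print(f"{100 * 189853 / report_total:.1f}%")  # 11.0%
```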
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 1728809 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 1728809 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 1728809 |
Protein_source
Categorical
High correlation  Imbalance 
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.1 MiB |
| prodigal | 49124 |
|---|---|
| RefSeq | 586 |
| Genbank | 256 |
| DDBJ | 19 |
| EMBL | 15 |
Length
| Max length | 8 |
|---|---|
| Median length | 8 |
| Mean length | 7.96872 |
| Min length | 4 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | RefSeq |
|---|---|
| 2nd row | RefSeq |
| 3rd row | RefSeq |
| 4th row | RefSeq |
| 5th row | RefSeq |
Common Values
| Value | Count | Frequency (%) |
| prodigal | 49124 | 98.2% |
| RefSeq | 586 | 1.2% |
| Genbank | 256 | 0.5% |
| DDBJ | 19 | < 0.1% |
| EMBL | 15 | < 0.1% |
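The 93.7% imbalance flagged for Protein_source can be reproduced from these counts if the score is one minus the normalized Shannon entropy, 1 − H / log2(k), where k is the number of categories. Treating this as ydata-profiling's scoring formula is an assumption, but it matches the reported value to one decimal:

```python
import math


def imbalance_score(counts):
    """1 - (Shannon entropy in bits / log2(number of classes)); 0 for a single class."""
    k = len(counts)
    if k <= 1:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


protein_source_counts = [49124, 586, 256, 19, 15]  # prodigal, RefSeq, Genbank, DDBJ, EMBL
print(f"{imbalance_score(protein_source_counts):.1%}")  # 93.7%
```

A perfectly balanced column scores 0. A column where almost every row shares one value approaches 1.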
Common Values (Plot)
| Value | Count | Frequency (%) |
| prodigal | 49124 | 98.2% |
| refseq | 586 | 1.2% |
| genbank | 256 | 0.5% |
| ddbj | 19 | < 0.1% |
| embl | 15 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| a | 49380 | 12.4% |
| r | 49124 | 12.3% |
| p | 49124 | 12.3% |
| o | 49124 | 12.3% |
| d | 49124 | 12.3% |
| i | 49124 | 12.3% |
| g | 49124 | 12.3% |
| l | 49124 | 12.3% |
| e | 1428 | 0.4% |
| R | 586 | 0.1% |
| Other values (13) | 3174 | 0.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 398436 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 398436 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 398436 |
Function_prediction_source
Categorical
High correlation  Missing 
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 22808 |
| Missing (%) | 45.6% |
| Memory size | 3.0 MiB |
| eggNOG-mapper | 10873 |
|---|---|
| Iterative search | 10077 |
| - | 5366 |
| RefSeq | 586 |
| Genbank | 256 |
| Other values (2) | 34 |
Length
| Max length | 16 |
|---|---|
| Median length | 13 |
| Mean length | 11.525118 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | RefSeq |
|---|---|
| 2nd row | RefSeq |
| 3rd row | RefSeq |
| 4th row | RefSeq |
| 5th row | RefSeq |
Common Values
| Value | Count | Frequency (%) |
| eggNOG-mapper | 10873 | 21.7% |
| Iterative search | 10077 | 20.2% |
| - | 5366 | 10.7% |
| RefSeq | 586 | 1.2% |
| Genbank | 256 | 0.5% |
| DDBJ | 19 | < 0.1% |
| EMBL | 15 | < 0.1% |
| (Missing) | 22808 | 45.6% |
Common Values (Plot)
| Value | Count | Frequency (%) |
| eggnog-mapper | 10873 | 29.2% |
| iterative | 10077 | 27.0% |
| search | 10077 | 27.0% |
| "" | 5366 | 14.4% |
| refseq | 586 | 1.6% |
| genbank | 256 | 0.7% |
| ddbj | 19 | 0.1% |
| embl | 15 | < 0.1% |
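The Common Values (Plot) table counts words rather than whole values. That is why "iterative" and "search" appear as separate rows, and why the percentages differ from the value table (RefSeq is 1.2% of rows but 1.6% of words). The sketch assumes that values are lowercased, split on whitespace and stripped of surrounding punctuation. This is an assumption, but it reproduces the reported shares, with "-" collapsing to an empty token. Under it, the word total is 37,269:

```python
import string
from collections import Counter


def word_counts(value_counts):
    """Expand category counts into word counts: lowercase, split on whitespace,
    strip surrounding punctuation (punctuation-only tokens become "")."""
    words = Counter()
    for value, n in value_counts.items():
        for token in value.lower().split():
            words[token.strip(string.punctuation)] += n
    return words


function_source = {
    "eggNOG-mapper": 10873, "Iterative search": 10077, "-": 5366,
    "RefSeq": 586, "Genbank": 256, "DDBJ": 19, "EMBL": 15,
}
words = word_counts(function_source)
total = sum(words.values())
print(total)                                     # 37269
print(f"{100 * words['refseq'] / total:.1f}%")   # 1.6%
print(words[""])                                 # 5366
```

Stripping only the ends of a token keeps internal punctuation, so "eggnog-mapper" survives as a single word.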
Most occurring characters
| Value | Count | Frequency (%) |
| e | 53405 | 17.0% |
| a | 31283 | 10.0% |
| r | 31027 | 9.9% |
| g | 21746 | 6.9% |
| p | 21746 | 6.9% |
| t | 20154 | 6.4% |
| - | 16239 | 5.2% |
| G | 11129 | 3.6% |
| m | 10873 | 3.5% |
| N | 10873 | 3.5% |
| Other values (21) | 84916 | 27.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 313391 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 313391 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 313391 |
Start
Real number (ℝ)
High correlation 
| Distinct | 34072 |
|---|---|
| Distinct (%) | 68.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 28967.163 |
| Minimum | 1 |
|---|---|
| Maximum | 428743 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1375.95 |
| Q1 | 8914.75 |
| median | 20912.5 |
| Q3 | 37411.5 |
| 95-th percentile | 87802.15 |
| Maximum | 428743 |
| Range | 428742 |
| Interquartile range (IQR) | 28496.75 |
Descriptive statistics
| Standard deviation | 30857.067 |
|---|---|
| Coefficient of variation (CV) | 1.065243 |
| Kurtosis | 14.251476 |
| Mean | 28967.163 |
| Median Absolute Deviation (MAD) | 13497.5 |
| Skewness | 2.882722 |
| Sum | 1.4483582 × 10⁹ |
| Variance | 9.521586 × 10⁸ |
| Monotonicity | Not monotonic |
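Several of these statistics are linked by simple identities: CV = standard deviation / mean, IQR = Q3 − Q1, Sum = mean × n, and Variance = standard deviation². Plugging in the rounded Start values reproduces the table to display precision, which makes a cheap sanity check on a transcribed report:

```python
# Reported summary values for Start (50,000 rows, none missing).
n = 50000
mean = 28967.163
std = 30857.067
q1, q3 = 8914.75, 37411.5

cv = std / mean        # coefficient of variation
iqr = q3 - q1          # interquartile range
total = mean * n       # sum of all values
variance = std ** 2    # variance is the squared standard deviation

print(round(cv, 6))       # 1.065243
print(iqr)                # 28496.75
print(f"{total:.4e}")     # 1.4484e+09
print(f"{variance:.4e}")  # 9.5216e+08
```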
| Value | Count | Frequency (%) |
| 1 | 214 | 0.4% |
| 3 | 182 | 0.4% |
| 2 | 168 | 0.3% |
| 50 | 26 | 0.1% |
| 1041 | 8 | < 0.1% |
| 550 | 8 | < 0.1% |
| 19703 | 8 | < 0.1% |
| 1651 | 7 | < 0.1% |
| 14073 | 7 | < 0.1% |
| 40 | 7 | < 0.1% |
| Other values (34062) | 49365 | 98.7% |
| Value | Count | Frequency (%) |
| 1 | 214 | 0.4% |
| 2 | 168 | 0.3% |
| 3 | 182 | 0.4% |
| 4 | 1 | < 0.1% |
| 5 | 1 | < 0.1% |
| 6 | 3 | < 0.1% |
| 7 | 2 | < 0.1% |
| 8 | 2 | < 0.1% |
| 9 | 1 | < 0.1% |
| 13 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 428743 | 1 | < 0.1% |
| 424369 | 1 | < 0.1% |
| 413772 | 1 | < 0.1% |
| 408912 | 1 | < 0.1% |
| 361484 | 1 | < 0.1% |
| 356037 | 1 | < 0.1% |
| 350687 | 1 | < 0.1% |
| 339671 | 1 | < 0.1% |
| 336590 | 1 | < 0.1% |
| 330709 | 1 | < 0.1% |
Stop
Real number (ℝ)
High correlation 
| Distinct | 34620 |
|---|---|
| Distinct (%) | 69.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 29650.514 |
| Minimum | 63 |
|---|---|
| Maximum | 428895 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | 63 |
|---|---|
| 5-th percentile | 1978 |
| Q1 | 9648 |
| median | 21621 |
| Q3 | 38088.5 |
| 95-th percentile | 88481.25 |
| Maximum | 428895 |
| Range | 428832 |
| Interquartile range (IQR) | 28440.5 |
Descriptive statistics
| Standard deviation | 30856.762 |
|---|---|
| Coefficient of variation (CV) | 1.0406822 |
| Kurtosis | 14.243434 |
| Mean | 29650.514 |
| Median Absolute Deviation (MAD) | 13469 |
| Skewness | 2.8816907 |
| Sum | 1.4825257 × 10⁹ |
| Variance | 9.5213978 × 10⁸ |
| Monotonicity | Not monotonic |
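Start and Stop are both complete (0 missing), so the mean of the per-row difference follows from the two means by linearity: 29,650.514 − 28,967.163 = 683.351. Reading that difference as an average feature span in bases is an assumption about what the columns encode. With 1-based inclusive coordinates, the length would be Stop − Start + 1. The reported sums give the same answer up to rounding:

```python
n = 50000
mean_start, mean_stop = 28967.163, 29650.514
sum_start, sum_stop = 1.4483582e9, 1.4825257e9

# Linearity of the mean: mean(Stop - Start) = mean(Stop) - mean(Start).
mean_span = mean_stop - mean_start
# Same quantity from the (rounded) reported sums.
span_from_sums = (sum_stop - sum_start) / n

print(round(mean_span, 3))       # 683.351
print(round(span_from_sums, 2))  # 683.35
```

The nearly identical standard deviations of the two columns (30,857.067 versus 30,856.762) fit the same picture: Stop tracks Start closely, which is also why the pair is flagged as highly correlated.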
| Value | Count | Frequency (%) |
| 5349 | 8 | < 0.1% |
| 8960 | 7 | < 0.1% |
| 3317 | 7 | < 0.1% |
| 9198 | 6 | < 0.1% |
| 11271 | 6 | < 0.1% |
| 8329 | 6 | < 0.1% |
| 2648 | 6 | < 0.1% |
| 1780 | 6 | < 0.1% |
| 717 | 6 | < 0.1% |
| 6446 | 6 | < 0.1% |
| Other values (34610) | 49936 | 99.9% |
| Value | Count | Frequency (%) |
| 63 | 1 | < 0.1% |
| 66 | 1 | < 0.1% |
| 67 | 2 | < 0.1% |
| 68 | 1 | < 0.1% |
| 71 | 2 | < 0.1% |
| 73 | 3 | < 0.1% |
| 75 | 2 | < 0.1% |
| 76 | 1 | < 0.1% |
| 79 | 2 | < 0.1% |
| 80 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 428895 | 1 | < 0.1% |
| 424716 | 1 | < 0.1% |
| 414050 | 1 | < 0.1% |
| 409217 | 1 | < 0.1% |
| 361909 | 1 | < 0.1% |
| 357215 | 1 | < 0.1% |
| 351340 | 1 | < 0.1% |
| 340450 | 1 | < 0.1% |
| 336715 | 1 | < 0.1% |
| 331221 | 1 | < 0.1% |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | + |
|---|---|
| 2nd row | + |
| 3rd row | - |
| 4th row | - |
| 5th row | + |
Common Values
| Value | Count | Frequency (%) |
| + | 25009 | 50.0% |
| - | 24991 | 50.0% |
Common Values (Plot)
| Value | Count | Frequency (%) |
| "" | 50000 | 100.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| + | 25009 | 50.0% |
| - | 24991 | 50.0% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 50000 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 50000 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 50000 |
Protein_ID
Text
Unique 
| Distinct | 50000 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 4.5 MiB |
Length
| Max length | 89 |
|---|---|
| Median length | 85 |
| Mean length | 37.44138 |
| Min length | 8 |
Unique
| Unique | 50000 |
|---|---|
| Unique (%) | 100.0% |
Sample
| 1st row | YP_003347791.1 |
|---|---|
| 2nd row | YP_008061629.1 |
| 3rd row | YP_001700595.1 |
| 4th row | YP_007877716.1 |
| 5th row | YP_007005441.1 |
| Value | Count | Frequency (%) |
| yp_004323762.1 | 1 | < 0.1% |
| biochar_1198_26 | 1 | < 0.1% |
| yp_003347791.1 | 1 | < 0.1% |
| yp_008061629.1 | 1 | < 0.1% |
| yp_001700595.1 | 1 | < 0.1% |
| yp_007877716.1 | 1 | < 0.1% |
| yp_007005441.1 | 1 | < 0.1% |
| yp_007675378.1 | 1 | < 0.1% |
| yp_007006505.1 | 1 | < 0.1% |
| np_899350.1 | 1 | < 0.1% |
| Other values (49990) | 49990 | 100.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 195579 | 10.4% |
| _ | 187103 | 10.0% |
| 3 | 118922 | 6.4% |
| 1 | 108508 | 5.8% |
| 2 | 97405 | 5.2% |
| 4 | 89323 | 4.8% |
| 5 | 88523 | 4.7% |
| 8 | 88245 | 4.7% |
| 9 | 79243 | 4.2% |
| 7 | 77096 | 4.1% |
| Other values (56) | 742122 | 39.6% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 1872069 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 1872069 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 1872069 |
Product
Text
| Distinct | 3990 |
|---|---|
| Distinct (%) | 8.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 4.0 MiB |
Length
| Max length | 902 |
|---|---|
| Median length | 761 |
| Mean length | 26.1706 |
| Min length | 2 |
Unique
| Unique | 1764 |
|---|---|
| Unique (%) | 3.5% |
Sample
| 1st row | hypothetical protein |
|---|---|
| 2nd row | HNH endonuclease |
| 3rd row | bacteriophage tail tip assembly protein%3B Lambda gpK homolog |
| 4th row | hypothetical protein |
| 5th row | DNA primase/helicase |
| Value | Count | Frequency (%) |
| unknown | 19675 | 11.6% |
| protein | 12785 | 7.6% |
| of | 4714 | 2.8% |
| hypothetical | 4343 | 2.6% |
| the | 4086 | 2.4% |
| domain | 3814 | 2.3% |
| phage | 3180 | 1.9% |
| family | 2963 | 1.8% |
| dna | 2889 | 1.7% |
| to | 2099 | 1.2% |
| Other values (5219) | 108485 | 64.2% |
Most occurring characters
| Value | Count | Frequency (%) |
| n | 128827 | 9.8% |
| (space) | 119048 | 9.1% |
| e | 101206 | 7.7% |
| o | 99962 | 7.6% |
| i | 88219 | 6.7% |
| t | 82679 | 6.3% |
| a | 75942 | 5.8% |
| r | 57663 | 4.4% |
| s | 49739 | 3.8% |
| l | 47978 | 3.7% |
| Other values (67) | 457267 | 34.9% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 1308530 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 1308530 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 1308530 |
| Distinct | 64 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.2 MiB |
Length
| Max length | 45 |
|---|---|
| Median length | 9 |
| Mean length | 10.45642 |
| Min length | 6 |
Unique
| Unique | 7 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | hypothetical; |
|---|---|
| 2nd row | packaging; |
| 3rd row | assembly;infection; |
| 4th row | hypothetical; |
| 5th row | replication; |
| Value | Count | Frequency (%) |
| unsorted | 27439 | 54.9% |
| hypothetical | 4341 | 8.7% |
| assembly | 3639 | 7.3% |
| replication | 2465 | 4.9% |
| infection | 1989 | 4.0% |
| packaging | 1729 | 3.5% |
| assembly;infection | 1495 | 3.0% |
| lysis | 1438 | 2.9% |
| integration | 1228 | 2.5% |
| regulation | 1117 | 2.2% |
| Other values (54) | 3120 | 6.2% |
Most occurring characters
| Value | Count | Frequency (%) |
| ; | 53934 | 10.3% |
| e | 50194 | 9.6% |
| t | 49640 | 9.5% |
| n | 47280 | 9.0% |
| o | 43312 | 8.3% |
| s | 42415 | 8.1% |
| r | 35409 | 6.8% |
| u | 30652 | 5.9% |
| i | 29958 | 5.7% |
| d | 27623 | 5.3% |
| Other values (15) | 112404 | 21.5% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 522821 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 522821 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 522821 |
Molecular_weight
Real number (ℝ)
High correlation 
| Distinct | 44528 |
|---|---|
| Distinct (%) | 89.2% |
| Missing | 70 |
| Missing (%) | 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4139.025 |
| Minimum | 75.0666 |
|---|---|
| Maximum | 8930.7077 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | 75.0666 |
|---|---|
| 5-th percentile | 446.43031 |
| Q1 | 2030.3248 |
| median | 4195.8407 |
| Q3 | 6243.8347 |
| 95-th percentile | 7691.756 |
| Maximum | 8930.7077 |
| Range | 8855.6411 |
| Interquartile range (IQR) | 4213.5098 |
Descriptive statistics
| Standard deviation | 2369.4824 |
|---|---|
| Coefficient of variation (CV) | 0.57247357 |
| Kurtosis | -1.2461612 |
| Mean | 4139.025 |
| Median Absolute Deviation (MAD) | 2102.2304 |
| Skewness | -0.042482482 |
| Sum | 2.0666152 × 10⁸ |
| Variance | 5614446.9 |
| Monotonicity | Not monotonic |
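Molecular_weight has 70 missing values. The arithmetic suggests that Mean and Distinct (%) are computed over the 49,930 non-missing rows rather than all 50,000. Sum / 49,930 recovers the mean, and 44,528 / 49,930 gives the reported 89.2%, whereas dividing by 50,000 would give 89.1%. Instability_index, which has the same 70 missing values, behaves the same way (39,246 / 49,930 = 78.6%).

```python
n_rows, n_missing = 50000, 70
n_present = n_rows - n_missing  # 49,930 non-missing values

reported_sum = 2.0666152e8
reported_mean = 4139.025
distinct = 44528

mean_from_sum = reported_sum / n_present
print(round(mean_from_sum, 1))               # 4139.0
print(f"{100 * distinct / n_present:.1f}%")  # 89.2% (matches the report)
print(f"{100 * distinct / n_rows:.1f}%")     # 89.1% (would not match)
```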
| Value | Count | Frequency (%) |
| 131.1729 | 110 | 0.2% |
| 146.1876 | 98 | 0.2% |
| 147.1293 | 73 | 0.1% |
| 174.201 | 58 | 0.1% |
| 105.0926 | 55 | 0.1% |
| 89.0932 | 54 | 0.1% |
| 117.1463 | 49 | 0.1% |
| 75.0666 | 47 | 0.1% |
| 146.1445 | 39 | 0.1% |
| 133.1027 | 39 | 0.1% |
| Other values (44518) | 49308 | 98.6% |
| (Missing) | 70 | 0.1% |
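The most frequent Molecular_weight values coincide with the average masses of single free amino acids: 75.0666 (glycine), 89.0932 (alanine), 105.0926 (serine), 117.1463 (valine), 131.1729 (leucine/isoleucine), 133.1027 (aspartic acid), and so on. This suggests that many rows describe one-residue sequences. The sketch assumes the usual average-mass convention: free amino-acid masses are summed and one water (18.0153 Da) is subtracted per peptide bond. The mass table is a hard-coded assumption limited to the residues needed here. Because leucine and isoleucine share a mass, the 110 rows at 131.1729 cannot tell them apart.

```python
# Average masses (Da) of free amino acids. These match the most frequent
# Molecular_weight values in this report; the table itself is an assumption
# and lists only the residues used here.
RESIDUE_MASS = {
    "G": 75.0666,   # glycine
    "A": 89.0932,   # alanine
    "S": 105.0926,  # serine
    "P": 115.1305,  # proline
    "V": 117.1463,  # valine
    "L": 131.1729,  # leucine
    "I": 131.1729,  # isoleucine (same mass as leucine)
    "D": 133.1027,  # aspartic acid
}
WATER = 18.0153  # average mass of the water lost per peptide bond


def peptide_weight(seq: str) -> float:
    """Average molecular weight: sum of free amino-acid masses minus one water per bond."""
    return sum(RESIDUE_MASS[aa] for aa in seq) - (len(seq) - 1) * WATER


for residue in "GASLI":
    print(residue, round(peptide_weight(residue), 4))
```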
| Value | Count | Frequency (%) |
| 75.0666 | 47 | 0.1% |
| 89.0932 | 54 | 0.1% |
| 105.0926 | 55 | 0.1% |
| 115.1305 | 18 | < 0.1% |
| 117.1463 | 49 | 0.1% |
| 119.1192 | 14 | < 0.1% |
| 121.1582 | 6 | < 0.1% |
| 131.1729 | 110 | 0.2% |
| 132.1179 | 39 | 0.1% |
| 133.1027 | 39 | 0.1% |
| Value | Count | Frequency (%) |
| 8930.7077 | 1 | < 0.1% |
| 8815.9815 | 1 | < 0.1% |
| 8811.4767 | 1 | < 0.1% |
| 8770.8033 | 1 | < 0.1% |
| 8768.9863 | 1 | < 0.1% |
| 8743.0033 | 1 | < 0.1% |
| 8730.1896 | 1 | < 0.1% |
| 8728.032 | 1 | < 0.1% |
| 8718.777 | 1 | < 0.1% |
| 8712.844 | 1 | < 0.1% |
Aromaticity
Real number (ℝ)
High correlation  Zeros 
| Distinct | 470 |
|---|---|
| Distinct (%) | 0.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.089570148 |
| Minimum | 0 |
|---|---|
| Maximum | 1 |
| Zeros | 8208 |
| Zeros (%) | 16.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0.042553191 |
| median | 0.083333333 |
| Q3 | 0.125 |
| 95-th percentile | 0.2 |
| Maximum | 1 |
| Range | 1 |
| Interquartile range (IQR) | 0.082446809 |
Descriptive statistics
| Standard deviation | 0.077318557 |
|---|---|
| Coefficient of variation (CV) | 0.86321792 |
| Kurtosis | 28.320356 |
| Mean | 0.089570148 |
| Median Absolute Deviation (MAD) | 0.041666667 |
| Skewness | 3.2710967 |
| Sum | 4478.5074 |
| Variance | 0.0059781592 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 8208 | 16.4% |
| 0.1428571429 | 1000 | 2.0% |
| 0.1 | 980 | 2.0% |
| 0.1111111111 | 971 | 1.9% |
| 0.125 | 934 | 1.9% |
| 0.09090909091 | 928 | 1.9% |
| 0.1666666667 | 810 | 1.6% |
| 0.07692307692 | 809 | 1.6% |
| 0.08333333333 | 789 | 1.6% |
| 0.07142857143 | 755 | 1.5% |
| Other values (460) | 33816 | 67.6% |
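Aromaticity values cluster on simple fractions (0.1428571429 = 1/7, 0.125 = 1/8, 0.1 = 1/10), which is what a fraction of aromatic residues in short sequences would produce. The sketch assumes the common definition: the count of phenylalanine, tryptophan and tyrosine divided by sequence length, as in Biopython's ProteinAnalysis.aromaticity(). It is applied to a hypothetical sequence:

```python
from fractions import Fraction

AROMATIC = set("FWY")  # phenylalanine, tryptophan, tyrosine (assumed definition)


def aromaticity(seq: str) -> Fraction:
    """Fraction of residues that are F, W or Y."""
    seq = seq.upper()
    return Fraction(sum(aa in AROMATIC for aa in seq), len(seq))


print(aromaticity("MKTAYIL"))         # 1/7
print(float(aromaticity("MKTAYIL")))  # 0.14285714285714285
```

Under this reading, the 8,208 zeros are sequences with no F, W or Y. The 64 rows at exactly 1 are sequences made only of aromatic residues.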
| Value | Count | Frequency (%) |
| 0 | 8208 | 16.4% |
| 0.01428571429 | 14 | < 0.1% |
| 0.01449275362 | 32 | 0.1% |
| 0.01470588235 | 21 | < 0.1% |
| 0.01492537313 | 20 | < 0.1% |
| 0.01515151515 | 18 | < 0.1% |
| 0.01538461538 | 36 | 0.1% |
| 0.015625 | 17 | < 0.1% |
| 0.01587301587 | 22 | < 0.1% |
| 0.01612903226 | 17 | < 0.1% |
| Value | Count | Frequency (%) |
| 1 | 64 | 0.1% |
| 0.75 | 2 | < 0.1% |
| 0.6666666667 | 23 | < 0.1% |
| 0.6 | 8 | < 0.1% |
| 0.5454545455 | 1 | < 0.1% |
| 0.5 | 157 | 0.3% |
| 0.4666666667 | 2 | < 0.1% |
| 0.4615384615 | 1 | < 0.1% |
| 0.4545454545 | 3 | < 0.1% |
| 0.4444444444 | 8 | < 0.1% |
Instability_index
Real number (ℝ)
Zeros 
| Distinct | 39246 |
|---|---|
| Distinct (%) | 78.6% |
| Missing | 70 |
| Missing (%) | 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 35.801315 |
| Minimum | -93.533333 |
|---|---|
| Maximum | 388.53333 |
| Zeros | 763 |
| Zeros (%) | 1.5% |
| Negative | 3351 |
| Negative (%) | 6.7% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | -93.533333 |
|---|---|
| 5-th percentile | -4.2333333 |
| Q1 | 17.619216 |
| median | 33.687388 |
| Q3 | 50.628983 |
| 95-th percentile | 83.915991 |
| Maximum | 388.53333 |
| Range | 482.06667 |
| Interquartile range (IQR) | 33.009767 |
Descriptive statistics
| Standard deviation | 29.322439 |
|---|---|
| Coefficient of variation (CV) | 0.81903246 |
| Kurtosis | 5.7274053 |
| Mean | 35.801315 |
| Median Absolute Deviation (MAD) | 16.518227 |
| Skewness | 1.1393033 |
| Sum | 1787559.7 |
| Variance | 859.80545 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 763 | 1.5% |
| 5 | 434 | 0.9% |
| 6.666666667 | 330 | 0.7% |
| 7.5 | 232 | 0.5% |
| 8 | 123 | 0.2% |
| -8.98 | 96 | 0.2% |
| -13.725 | 95 | 0.2% |
| -21.63333333 | 85 | 0.2% |
| 55.65 | 84 | 0.2% |
| -37.45 | 82 | 0.2% |
| Other values (39236) | 47606 | 95.2% |
| Value | Count | Frequency (%) |
| -93.53333333 | 2 | < 0.1% |
| -79.55 | 2 | < 0.1% |
| -78 | 1 | < 0.1% |
| -74.83333333 | 1 | < 0.1% |
| -72.525 | 2 | < 0.1% |
| -71.73333333 | 4 | < 0.1% |
| -70.15 | 18 | < 0.1% |
| -69.1 | 2 | < 0.1% |
| -68.56666667 | 5 | < 0.1% |
| -67.65 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 388.5333333 | 1 | < 0.1% |
| 291.4 | 13 | < 0.1% |
| 275.64 | 1 | < 0.1% |
| 269.8 | 1 | < 0.1% |
| 261.8 | 3 | < 0.1% |
| 260.1333333 | 1 | < 0.1% |
| 258.3 | 1 | < 0.1% |
| 258.05 | 2 | < 0.1% |
| 248.6444444 | 1 | < 0.1% |
| 245.5333333 | 1 | < 0.1% |
Isoelectric_point
Real number (ℝ)
| Distinct | 19985 |
|---|---|
| Distinct (%) | 40.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.816935 |
| Minimum | 4.0500284 |
|---|---|
| Maximum | 11.999968 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | 4.0500284 |
|---|---|
| 5-th percentile | 4.0500284 |
| Q1 | 4.6183046 |
| median | 6.0690062 |
| Q3 | 9.1618761 |
| 95-th percentile | 10.6676 |
| Maximum | 11.999968 |
| Range | 7.9499393 |
| Interquartile range (IQR) | 4.5435715 |
Descriptive statistics
| Standard deviation | 2.3676714 |
|---|---|
| Coefficient of variation (CV) | 0.34732199 |
| Kurtosis | -1.2709006 |
| Mean | 6.816935 |
| Median Absolute Deviation (MAD) | 1.9125183 |
| Skewness | 0.41583882 |
| Sum | 340846.75 |
| Variance | 5.6058679 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 4.050028419 | 4580 | 9.2% |
| 5.525000191 | 762 | 1.5% |
| 11.99996777 | 531 | 1.1% |
| 8.750052071 | 429 | 0.9% |
| 9.750021172 | 251 | 0.5% |
| 5.240009499 | 157 | 0.3% |
| 5.524318123 | 147 | 0.3% |
| 11.00083675 | 141 | 0.3% |
| 5.57001667 | 141 | 0.3% |
| 5.494989204 | 138 | 0.3% |
| Other values (19975) | 42723 | 85.4% |
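The two tall spikes in Isoelectric_point are 4.050028419 (4,580 rows, 9.2%) and 11.99996777 (531 rows, 1.1%). These are exactly the values a bisection search returns when the net charge never changes sign inside its bracket. The sketch assumes a bracket of [4.05, 12], a starting pH of 7.775 and a tolerance of 1e-4, which are, to our understanding, the defaults of Biopython's IsoelectricPoint.pi. The charge functions are hypothetical stand-ins for a real charge model.

```python
def bisect_pi(charge_at, ph=7.775, lo=4.05, hi=12.0, tol=1e-4):
    """Bisection search for the pH at which the net charge changes sign.

    While the bracket [lo, hi] is wider than tol: if the charge at the current
    pH is positive, raise lo to it; otherwise lower hi to it. Then probe the
    midpoint of the new bracket. Returns the last probed pH.
    """
    while hi - lo > tol:
        if charge_at(ph) > 0.0:
            lo = ph
        else:
            hi = ph
        ph = (lo + hi) / 2
    return ph


# Hypothetical charge functions whose sign never changes inside [4.05, 12].
def always_negative(ph):
    return -1.0


def always_positive(ph):
    return 1.0


# Hypothetical charge that crosses zero at pH 7 (positive below, negative above).
def toy_charge(ph):
    return 7.0 - ph


print(f"{bisect_pi(always_negative):.9f}")  # 4.050028419
print(f"{bisect_pi(always_positive):.8f}")  # 11.99996777
print(round(bisect_pi(toy_charge), 3))      # 7.0
```

Under these assumptions, the spike values are artifacts of the search bounds rather than estimates. Rows at the spikes are better read as "at or beyond the bound" than as precise isoelectric points.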
| Value | Count | Frequency (%) |
| 4.050028419 | 4580 | 9.2% |
| 4.051619911 | 1 | < 0.1% |
| 4.052074623 | 1 | < 0.1% |
| 4.052131462 | 1 | < 0.1% |
| 4.052586174 | 3 | < 0.1% |
| 4.052699852 | 3 | < 0.1% |
| 4.053779793 | 1 | < 0.1% |
| 4.053836632 | 1 | < 0.1% |
| 4.05395031 | 2 | < 0.1% |
| 4.054007149 | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| 11.99996777 | 531 | 1.1% |
| 11.94478283 | 1 | < 0.1% |
| 11.93324299 | 1 | < 0.1% |
| 11.92660275 | 1 | < 0.1% |
| 11.91719036 | 1 | < 0.1% |
| 11.91712589 | 1 | < 0.1% |
| 11.91706142 | 2 | < 0.1% |
| 11.91196842 | 1 | < 0.1% |
| 11.91022778 | 2 | < 0.1% |
| 11.90784245 | 1 | < 0.1% |
Helix_fraction
Real number (ℝ)
Zeros 
| Distinct | 1156 |
|---|---|
| Distinct (%) | 2.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.29489187 |
| Minimum | 0 |
|---|---|
| Maximum | 1 |
| Zeros | 2122 |
| Zeros (%) | 4.2% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.083333333 |
| Q1 | 0.23636364 |
| median | 0.2962963 |
| Q3 | 0.35185185 |
| 95-th percentile | 0.48484848 |
| Maximum | 1 |
| Range | 1 |
| Interquartile range (IQR) | 0.11548822 |
Descriptive statistics
| Standard deviation | 0.12461176 |
|---|---|
| Coefficient of variation (CV) | 0.42256765 |
| Kurtosis | 6.2809129 |
| Mean | 0.29489187 |
| Median Absolute Deviation (MAD) | 0.057490326 |
| Skewness | 0.92946035 |
| Sum | 14744.593 |
| Variance | 0.015528091 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0.3333333333 | 2615 | 5.2% |
| 0 | 2122 | 4.2% |
| 0.25 | 1619 | 3.2% |
| 0.2857142857 | 1057 | 2.1% |
| 0.5 | 1006 | 2.0% |
| 0.2 | 914 | 1.8% |
| 0.3 | 744 | 1.5% |
| 0.4 | 724 | 1.4% |
| 0.2727272727 | 623 | 1.2% |
| 0.375 | 573 | 1.1% |
| Other values (1146) | 38003 | 76.0% |
| Value | Count | Frequency (%) |
| 0 | 2122 | 4.2% |
| 0.01818181818 | 1 | < 0.1% |
| 0.02 | 1 | < 0.1% |
| 0.02173913043 | 1 | < 0.1% |
| 0.02222222222 | 1 | < 0.1% |
| 0.02325581395 | 1 | < 0.1% |
| 0.0243902439 | 1 | < 0.1% |
| 0.02777777778 | 2 | < 0.1% |
| 0.02941176471 | 5 | < 0.1% |
| 0.0303030303 | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| 1 | 301 | 0.6% |
| 0.875 | 2 | < 0.1% |
| 0.8571428571 | 3 | < 0.1% |
| 0.8571428571 | 4 | < 0.1% |
| 0.8333333333 | 2 | < 0.1% |
| 0.8333333333 | 2 | < 0.1% |
| 0.8181818182 | 1 | < 0.1% |
| 0.8 | 8 | < 0.1% |
| 0.8 | 1 | < 0.1% |
| 0.7857142857 | 1 | < 0.1% |
Turn_fraction
Real number (ℝ)
Zeros 
| Distinct | 884 |
|---|---|
| Distinct (%) | 1.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.2061528 |
| Minimum | 0 |
|---|---|
| Maximum | 1 |
| Zeros | 2961 |
| Zeros (%) | 5.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0.14285714 |
| median | 0.2 |
| Q3 | 0.25531915 |
| 95-th percentile | 0.37931034 |
| Maximum | 1 |
| Range | 1 |
| Interquartile range (IQR) | 0.11246201 |
Descriptive statistics
| Standard deviation | 0.11343311 |
|---|---|
| Coefficient of variation (CV) | 0.55023804 |
| Kurtosis | 9.3726165 |
| Mean | 0.2061528 |
| Median Absolute Deviation (MAD) | 0.056410256 |
| Skewness | 1.7154598 |
| Sum | 10307.64 |
| Variance | 0.012867071 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 2961 | 5.9% |
| 0.25 | 1657 | 3.3% |
| 0.2 | 1603 | 3.2% |
| 0.1666666667 | 1399 | 2.8% |
| 0.3333333333 | 1173 | 2.3% |
| 0.1428571429 | 1045 | 2.1% |
| 0.2222222222 | 728 | 1.5% |
| 0.1818181818 | 728 | 1.5% |
| 0.125 | 664 | 1.3% |
| 0.2857142857 | 647 | 1.3% |
| Other values (874) | 37395 | 74.8% |
| Value | Count | Frequency (%) |
| 0 | 2961 | 5.9% |
| 0.01754385965 | 1 | < 0.1% |
| 0.01886792453 | 1 | < 0.1% |
| 0.01960784314 | 1 | < 0.1% |
| 0.02 | 1 | < 0.1% |
| 0.02040816327 | 2 | < 0.1% |
| 0.02127659574 | 4 | < 0.1% |
| 0.02173913043 | 1 | < 0.1% |
| 0.02222222222 | 1 | < 0.1% |
| 0.02380952381 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 1 | 180 | 0.4% |
| 0.8888888889 | 1 | < 0.1% |
| 0.88 | 1 | < 0.1% |
| 0.8666666667 | 2 | < 0.1% |
| 0.8571428571 | 1 | < 0.1% |
| 0.8571428571 | 1 | < 0.1% |
| 0.8333333333 | 2 | < 0.1% |
| 0.8275862069 | 1 | < 0.1% |
| 0.8 | 8 | < 0.1% |
| 0.75 | 32 | 0.1% |
Sheet_fraction
Real number (ℝ)
Zeros 
| Distinct | 1005 |
|---|---|
| Distinct (%) | 2.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.25831688 |
| Minimum | 0 |
|---|---|
| Maximum | 1 |
| Zeros | 2307 |
| Zeros (%) | 4.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.055555556 |
| Q1 | 0.1875 |
| median | 0.25 |
| Q3 | 0.32 |
| 95-th percentile | 0.45454545 |
| Maximum | 1 |
| Range | 1 |
| Interquartile range (IQR) | 0.1325 |
Descriptive statistics
| Standard deviation | 0.12661106 |
|---|---|
| Coefficient of variation (CV) | 0.49013854 |
| Kurtosis | 6.2826014 |
| Mean | 0.25831688 |
| Median Absolute Deviation (MAD) | 0.065789474 |
| Skewness | 1.2605728 |
| Sum | 12915.844 |
| Variance | 0.01603036 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 2307 | 4.6% |
| 0.25 | 1847 | 3.7% |
| 0.3333333333 | 1803 | 3.6% |
| 0.2 | 1274 | 2.5% |
| 0.2857142857 | 906 | 1.8% |
| 0.1666666667 | 895 | 1.8% |
| 0.5 | 856 | 1.7% |
| 0.2222222222 | 672 | 1.3% |
| 0.1428571429 | 649 | 1.3% |
| 0.2727272727 | 588 | 1.2% |
| Other values (995) | 38203 | 76.4% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 2307 | 4.6% |
| 0.01886792453 | 1 | < 0.1% |
| 0.01960784314 | 1 | < 0.1% |
| 0.02040816327 | 1 | < 0.1% |
| 0.025 | 1 | < 0.1% |
| 0.02702702703 | 1 | < 0.1% |
| 0.02941176471 | 1 | < 0.1% |
| 0.0303030303 | 3 | < 0.1% |
| 0.03225806452 | 2 | < 0.1% |
| 0.03333333333 | 3 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 272 | 0.5% |
| 0.8888888889 | 1 | < 0.1% |
| 0.8823529412 | 1 | < 0.1% |
| 0.875 | 2 | < 0.1% |
| 0.8333333333 | 5 | < 0.1% |
| 0.8 | 16 | < 0.1% |
| 0.7777777778 | 1 | < 0.1% |
| 0.7777777778 | 1 | < 0.1% |
| 0.7692307692 | 1 | < 0.1% |
| 0.75 | 51 | 0.1% |
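Turn_fraction and Sheet_fraction pile up on simple ratios (0.25, 1/3, 0.2, 1/6), with thousands of rows at exactly 0 and a few hundred at exactly 1. That is what a count of residues from a fixed set divided by sequence length produces for short proteins. The report does not say how the fractions were computed; the column names resemble the helix/turn/sheet triple returned by Biopython's `ProteinAnalysis.secondary_structure_fraction()`, but the residue sets depend on the tool and its version, so `HYPOTHETICAL_SETS` below is a placeholder to be replaced with the sets actually used:

```python
from fractions import Fraction
from typing import Dict

# Hypothetical residue sets, for illustration only: replace them with the sets
# used by the tool (and version) that actually produced the dataset.
HYPOTHETICAL_SETS: Dict[str, str] = {
    "helix": "VIYFWL",
    "turn": "NPGS",
    "sheet": "EMAL",
}


def residue_set_fractions(
    sequence: str, sets: Dict[str, str] = HYPOTHETICAL_SETS
) -> Dict[str, Fraction]:
    """Fraction of the residues in `sequence` that belong to each residue set."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    # Sets may overlap (L appears twice here), so fractions need not sum to 1.
    return {
        name: Fraction(sum(seq.count(aa) for aa in residues), len(seq))
        for name, residues in sets.items()
    }
```

Exact `Fraction`s make the granularity visible: a 24-residue protein can only yield multiples of 1/24, so short proteins concentrate the distribution on a handful of values.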
Reduced_coefficient
Real number (ℝ)
High correlation  Zeros 
| Distinct | 78 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4977.758 |
| Minimum | 0 |
|---|---|
| Maximum | 49500 |
| Zeros | 13681 |
| Zeros (%) | 27.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 2980 |
| Q3 | 7450 |
| 95-th percentile | 15470 |
| Maximum | 49500 |
| Range | 49500 |
| Interquartile range (IQR) | 7450 |
Descriptive statistics
| Standard deviation | 5519.6204 |
|---|---|
| Coefficient of variation (CV) | 1.1088567 |
| Kurtosis | 2.8873211 |
| Mean | 4977.758 |
| Median Absolute Deviation (MAD) | 2980 |
| Skewness | 1.5171578 |
| Sum | 2.488879 × 10⁸ |
| Variance | 30466209 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 13681 | 27.4% |
| 1490 | 8144 | 16.3% |
| 2980 | 5037 | 10.1% |
| 6990 | 3129 | 6.3% |
| 5500 | 2871 | 5.7% |
| 4470 | 2818 | 5.6% |
| 8480 | 2476 | 5.0% |
| 9970 | 1624 | 3.2% |
| 5960 | 1557 | 3.1% |
| 12490 | 1073 | 2.1% |
| Other values (68) | 7590 | 15.2% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 13681 | 27.4% |
| 1490 | 8144 | 16.3% |
| 2980 | 5037 | 10.1% |
| 4470 | 2818 | 5.6% |
| 5500 | 2871 | 5.7% |
| 5960 | 1557 | 3.1% |
| 6990 | 3129 | 6.3% |
| 7450 | 708 | 1.4% |
| 8480 | 2476 | 5.0% |
| 8940 | 279 | 0.6% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 49500 | 1 | < 0.1% |
| 46980 | 1 | < 0.1% |
| 45490 | 3 | < 0.1% |
| 44000 | 1 | < 0.1% |
| 41940 | 1 | < 0.1% |
| 41480 | 1 | < 0.1% |
| 40450 | 2 | < 0.1% |
| 39990 | 3 | < 0.1% |
| 38960 | 3 | < 0.1% |
| 38500 | 1 | < 0.1% |
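Reduced_coefficient takes only 78 distinct values, and every common value listed above (1490, 2980, 4470, 5500, 5960, 6990, 8480, 9970, 12490) is a non-negative integer combination of 1490 and 5500. Those are the per-residue molar extinction coefficients at 280 nm that ExPASy ProtParam uses for Tyr (1490) and Trp (5500), so the column is consistent with ε_reduced = 5500·n_Trp + 1490·n_Tyr, and a value of 0 would then mean a protein with no Trp or Tyr. Assuming that formula, this sketch recovers the (Trp, Tyr) counts that can produce a given value; `decompose` is a hypothetical helper:

```python
from typing import List, Tuple

TRP_EPS = 5500  # M^-1 cm^-1 at 280 nm, ProtParam value per Trp
TYR_EPS = 1490  # M^-1 cm^-1 at 280 nm, ProtParam value per Tyr


def decompose(eps: int) -> List[Tuple[int, int]]:
    """All (n_trp, n_tyr) pairs with TRP_EPS * n_trp + TYR_EPS * n_tyr == eps."""
    pairs: List[Tuple[int, int]] = []
    for n_trp in range(eps // TRP_EPS + 1):
        rest = eps - n_trp * TRP_EPS
        if rest % TYR_EPS == 0:
            pairs.append((n_trp, rest // TYR_EPS))
    return pairs


# The ten most common Reduced_coefficient values from the report.
COMMON_VALUES = [0, 1490, 2980, 6990, 5500, 4470, 8480, 9970, 5960, 12490]

if __name__ == "__main__":
    for value in COMMON_VALUES:
        print(value, decompose(value))
```

An empty list for a value would argue against the assumed formula; every common value in the table decomposes, most of them in exactly one way.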
Oxidized_coefficient
Real number (ℝ)
High correlation  Zeros 
| Distinct | 217 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4996.358 |
| Minimum | 0 |
|---|---|
| Maximum | 49500 |
| Zeros | 13180 |
| Zeros (%) | 26.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 2980 |
| Q3 | 7450 |
| 95-th percentile | 15720 |
| Maximum | 49500 |
| Range | 49500 |
| Interquartile range (IQR) | 7450 |
Descriptive statistics
| Standard deviation | 5529.528 |
|---|---|
| Coefficient of variation (CV) | 1.1067117 |
| Kurtosis | 2.8697698 |
| Mean | 4996.358 |
| Median Absolute Deviation (MAD) | 2980 |
| Skewness | 1.5137876 |
| Sum | 2.498179 × 10⁸ |
| Variance | 30575680 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 13180 | 26.4% |
| 1490 | 7491 | 15.0% |
| 2980 | 4413 | 8.8% |
| 6990 | 2645 | 5.3% |
| 5500 | 2581 | 5.2% |
| 4470 | 2382 | 4.8% |
| 8480 | 2071 | 4.1% |
| 9970 | 1324 | 2.6% |
| 5960 | 1280 | 2.6% |
| 12490 | 914 | 1.8% |
| Other values (207) | 11719 | 23.4% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 13180 | 26.4% |
| 125 | 416 | 0.8% |
| 250 | 75 | 0.1% |
| 375 | 8 | < 0.1% |
| 500 | 2 | < 0.1% |
| 1490 | 7491 | 15.0% |
| 1615 | 537 | 1.1% |
| 1740 | 93 | 0.2% |
| 1865 | 17 | < 0.1% |
| 1990 | 5 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 49500 | 1 | < 0.1% |
| 46980 | 1 | < 0.1% |
| 45490 | 3 | < 0.1% |
| 44000 | 1 | < 0.1% |
| 41940 | 1 | < 0.1% |
| 41480 | 1 | < 0.1% |
| 40450 | 2 | < 0.1% |
| 40115 | 1 | < 0.1% |
| 39990 | 2 | < 0.1% |
| 38960 | 3 | < 0.1% |
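Oxidized_coefficient tracks Reduced_coefficient almost exactly (0.998 in the correlation matrix), and its smallest non-zero values step by 125 (125, 250, 375, 500, then 1490 + 125 = 1615 and so on). That matches the ProtParam convention of adding 125 M^-1 cm^-1 per cystine when all Cys residues are assumed to form disulfide pairs; the sample rows show the same offset (11000 vs 11125, 19480 vs 19605). A sketch under that assumption; the helper names are hypothetical, and pairing cysteines as `n_cys // 2` is an assumption about how an odd Cys count is handled:

```python
from typing import Tuple

# M^-1 cm^-1 at 280 nm (ProtParam values, assumed to match this dataset)
TRP_EPS, TYR_EPS, CYSTINE_EPS = 5500, 1490, 125


def extinction_coefficients(sequence: str) -> Tuple[int, int]:
    """Return (reduced, oxidized) molar extinction coefficients for a protein."""
    seq = sequence.upper()
    reduced = TRP_EPS * seq.count("W") + TYR_EPS * seq.count("Y")
    # Every pair of Cys is counted as one cystine; an unpaired Cys adds nothing.
    oxidized = reduced + CYSTINE_EPS * (seq.count("C") // 2)
    return reduced, oxidized


def cystines_from_columns(reduced: int, oxidized: int) -> int:
    """Cystine count implied by one (Reduced, Oxidized) row, same assumption."""
    diff = oxidized - reduced
    if diff < 0 or diff % CYSTINE_EPS:
        raise ValueError(f"offset {diff} is not a non-negative multiple of {CYSTINE_EPS}")
    return diff // CYSTINE_EPS
```

Applied to the sample rows, `cystines_from_columns(19480, 19605)` gives one cystine, and rows with equal values in both columns give zero.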
Phage_source
Categorical
High correlation 
| Distinct | 14 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 2.9 MiB |
| IMG_VR | 13979 |
|---|---|
| MGV | 12188 |
| GPD | 8874 |
| GOV2 | 6028 |
| TemPhD | 4021 |
| Other values (9) | 4910 |
Length
| Max length | 8 |
|---|---|
| Median length | 7 |
| Mean length | 4.349 |
| Min length | 3 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | RefSeq |
|---|---|
| 2nd row | RefSeq |
| 3rd row | RefSeq |
| 4th row | RefSeq |
| 5th row | RefSeq |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| IMG_VR | 13979 | 28.0% |
| MGV | 12188 | 24.4% |
| GPD | 8874 | 17.7% |
| GOV2 | 6028 | 12.1% |
| TemPhD | 4021 | 8.0% |
| CHVD | 2273 | 4.5% |
| GVD | 840 | 1.7% |
| RefSeq | 586 | 1.2% |
| PhagesDB | 393 | 0.8% |
| IGVD | 368 | 0.7% |
| Other values (4) | 450 | 0.9% |
Common words
| Value | Count | Frequency (%) |
|---|---|---|
| img_vr | 13979 | 28.0% |
| mgv | 12188 | 24.4% |
| gpd | 8874 | 17.7% |
| gov2 | 6028 | 12.1% |
| temphd | 4021 | 8.0% |
| chvd | 2273 | 4.5% |
| gvd | 840 | 1.7% |
| refseq | 586 | 1.2% |
| phagesdb | 393 | 0.8% |
| igvd | 368 | 0.7% |
| Other values (4) | 450 | 0.9% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| G | 42533 | 19.6% |
| V | 35836 | 16.5% |
| M | 26182 | 12.0% |
| D | 16807 | 7.7% |
| R | 14565 | 6.7% |
| I | 14347 | 6.6% |
| _ | 13979 | 6.4% |
| P | 13288 | 6.1% |
| O | 6028 | 2.8% |
| 2 | 6028 | 2.8% |
| Other values (19) | 27857 | 12.8% |
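The character table counts every character of every non-missing value, case-sensitively, so its counts add up to the total string length: 217450 characters for Phage_source, the same total shown for the single "(unknown)" category. Dividing that total by the 50000 non-missing values gives the mean length of 4.349 reported above, and the percentages are taken over all characters (D: 16807 of 217450 = 7.7%). A minimal sketch of both computations; `char_summary` is a hypothetical helper:

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def char_summary(
    values: Iterable[Optional[str]], top: int = 10
) -> Tuple[List[Tuple[str, int, float]], int, float]:
    """Return (top characters with percentages, total characters, mean length)."""
    present = [v for v in values if v is not None]  # missing values are skipped
    chars: Counter = Counter()
    for value in present:
        chars.update(value)  # one count per character, case-sensitive
    total_chars = sum(chars.values())
    mean_length = total_chars / len(present) if present else float("nan")
    table = [(ch, n, 100.0 * n / total_chars) for ch, n in chars.most_common(top)]
    return table, total_chars, mean_length
```

The same relation holds for Function_Prediction_source further down: 150980 characters over 22808 non-missing values gives its mean length of 6.6196.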
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 217450 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 217450 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 217450 | 100.0% |
Function_Prediction_source
Categorical
High correlation  Missing 
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 27192 |
| Missing (%) | 54.4% |
| Memory size | 2.8 MiB |
| - | 12530 |
|---|---|
| eggNOG-mapper | 8666 |
| Iterative search | 1612 |
Length
| Max length | 16 |
|---|---|
| Median length | 1 |
| Mean length | 6.6196072 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | eggNOG-mapper |
|---|---|
| 2nd row | eggNOG-mapper |
| 3rd row | eggNOG-mapper |
| 4th row | eggNOG-mapper |
| 5th row | eggNOG-mapper |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| - | 12530 | 25.1% |
| eggNOG-mapper | 8666 | 17.3% |
| Iterative search | 1612 | 3.2% |
| (Missing) | 27192 | 54.4% |
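Function_Prediction_source is missing in 27192 rows and Function_prediction_source (lowercase "p") in 22808; the two counts add up to exactly 50000, and in the sample rows each row fills one column or the other. That is consistent with a single field split by a capitalization mismatch when sources were merged, although the counts alone do not rule out an equal number of rows where both or neither column is filled. A sketch that coalesces the two columns and reports such rows instead of silently overwriting them; `coalesce` is a hypothetical helper:

```python
from typing import List, Optional, Sequence, Tuple


def coalesce(
    lower: Sequence[Optional[str]], upper: Sequence[Optional[str]]
) -> Tuple[List[Optional[str]], List[int], int]:
    """Merge two columns row-wise, preferring whichever value is present.

    Returns the merged column, the indices of rows where both columns are
    filled (the first column wins there), and the number of rows where
    neither is filled.
    """
    if len(lower) != len(upper):
        raise ValueError("columns must have the same length")
    merged: List[Optional[str]] = []
    conflicts: List[int] = []
    neither = 0
    for i, (a, b) in enumerate(zip(lower, upper)):
        if a is not None and b is not None:
            conflicts.append(i)
        elif a is None and b is None:
            neither += 1
        merged.append(a if a is not None else b)
    return merged, conflicts, neither
```

With pandas, `Series.combine_first` performs the same merge (for a hypothetical frame `df`, `df["Function_prediction_source"].combine_first(df["Function_Prediction_source"])`), but it does not report the overlapping rows.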
Common words
| Value | Count | Frequency (%) |
|---|---|---|
| - | 12530 | 51.3% |
| eggnog-mapper | 8666 | 35.5% |
| iterative | 1612 | 6.6% |
| search | 1612 | 6.6% |
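The word table lowercases each value and splits it on whitespace: "Iterative search" contributes the two words "iterative" and "search", while "eggNOG-mapper" and "-" each stay a single word. Percentages here are relative to the total number of words (12530 + 8666 + 1612 + 1612 = 24420), not rows, which is why 1612 occurrences show as 6.6%. A sketch of that behaviour, inferred from the tables rather than taken from ydata-profiling's source; `word_counts` and `word_table` are hypothetical:

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def word_counts(values: Iterable[Optional[str]]) -> Counter:
    """Count lowercased, whitespace-separated words across non-missing values."""
    counts: Counter = Counter()
    for value in values:
        if value is None:  # missing cells contribute no words
            continue
        counts.update(value.lower().split())
    return counts


def word_table(counts: Counter, top: int = 10) -> List[Tuple[str, int, float]]:
    """Rows of (word, count, percent of all words)."""
    total = sum(counts.values())
    return [(w, c, 100.0 * c / total) for w, c in counts.most_common(top)]
```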
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| e | 22168 | 14.7% |
| - | 21196 | 14.0% |
| g | 17332 | 11.5% |
| p | 17332 | 11.5% |
| a | 11890 | 7.9% |
| r | 11890 | 7.9% |
| G | 8666 | 5.7% |
| O | 8666 | 5.7% |
| N | 8666 | 5.7% |
| m | 8666 | 5.7% |
| Other values (8) | 14508 | 9.6% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 150980 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 150980 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 150980 | 100.0% |
Correlations
| | Aromaticity | Function_Prediction_source | Function_prediction_source | Helix_fraction | Instability_index | Isoelectric_point | Molecular_weight | Oxidized_coefficient | Phage_source | Protein_source | Reduced_coefficient | Sheet_fraction | Start | Stop | Strand | Turn_fraction |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Aromaticity | 1.000 | 0.000 | 0.016 | 0.454 | -0.015 | -0.009 | 0.206 | 0.596 | 0.018 | 0.000 | 0.600 | -0.237 | 0.038 | 0.038 | 0.005 | -0.035 |
| Function_Prediction_source | 0.000 | 1.000 | 0.000 | 0.036 | 0.015 | 0.035 | 0.039 | 0.006 | 0.324 | 1.000 | 0.005 | 0.050 | 0.067 | 0.066 | 0.000 | 0.063 |
| Function_prediction_source | 0.016 | 0.000 | 1.000 | 0.041 | 0.000 | 0.019 | 0.023 | 0.000 | 0.823 | 1.000 | 0.000 | 0.008 | 0.080 | 0.079 | 0.053 | 0.030 |
| Helix_fraction | 0.454 | 0.036 | 0.041 | 1.000 | -0.142 | -0.059 | 0.063 | 0.238 | 0.019 | 0.011 | 0.243 | -0.065 | 0.049 | 0.048 | 0.012 | -0.195 |
| Instability_index | -0.015 | 0.015 | 0.000 | -0.142 | 1.000 | -0.029 | 0.169 | 0.085 | 0.009 | 0.000 | 0.081 | 0.146 | -0.003 | -0.004 | 0.004 | 0.021 |
| Isoelectric_point | -0.009 | 0.035 | 0.019 | -0.059 | -0.029 | 1.000 | 0.056 | 0.031 | 0.022 | 0.000 | 0.032 | -0.276 | -0.015 | -0.015 | 0.000 | 0.001 |
| Molecular_weight | 0.206 | 0.039 | 0.023 | 0.063 | 0.169 | 0.056 | 1.000 | 0.627 | 0.011 | 0.001 | 0.620 | 0.037 | 0.001 | -0.001 | 0.000 | 0.019 |
| Oxidized_coefficient | 0.596 | 0.006 | 0.000 | 0.238 | 0.085 | 0.031 | 0.627 | 1.000 | 0.006 | 0.000 | 0.998 | -0.116 | 0.014 | 0.012 | 0.000 | 0.008 |
| Phage_source | 0.018 | 0.324 | 0.823 | 0.019 | 0.009 | 0.022 | 0.011 | 0.006 | 1.000 | 1.000 | 0.006 | 0.022 | 0.069 | 0.069 | 0.054 | 0.020 |
| Protein_source | 0.000 | 1.000 | 1.000 | 0.011 | 0.000 | 0.000 | 0.001 | 0.000 | 1.000 | 1.000 | 0.000 | 0.000 | 0.058 | 0.058 | 0.040 | 0.007 |
| Reduced_coefficient | 0.600 | 0.005 | 0.000 | 0.243 | 0.081 | 0.032 | 0.620 | 0.998 | 0.006 | 0.000 | 1.000 | -0.113 | 0.014 | 0.012 | 0.000 | 0.007 |
| Sheet_fraction | -0.237 | 0.050 | 0.008 | -0.065 | 0.146 | -0.276 | 0.037 | -0.116 | 0.022 | 0.000 | -0.113 | 1.000 | -0.022 | -0.025 | 0.000 | -0.325 |
| Start | 0.038 | 0.067 | 0.080 | 0.049 | -0.003 | -0.015 | 0.001 | 0.014 | 0.069 | 0.058 | 0.014 | -0.022 | 1.000 | 0.999 | 0.000 | -0.020 |
| Stop | 0.038 | 0.066 | 0.079 | 0.048 | -0.004 | -0.015 | -0.001 | 0.012 | 0.069 | 0.058 | 0.012 | -0.025 | 0.999 | 1.000 | 0.000 | -0.015 |
| Strand | 0.005 | 0.000 | 0.053 | 0.012 | 0.004 | 0.000 | 0.000 | 0.000 | 0.054 | 0.040 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.006 |
| Turn_fraction | -0.035 | 0.063 | 0.030 | -0.195 | 0.021 | 0.001 | 0.019 | 0.008 | 0.020 | 0.007 | 0.007 | -0.325 | -0.020 | -0.015 | 0.006 | 1.000 |
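Every "high correlation" alert in the overview corresponds to a pair in this matrix with an absolute coefficient above 0.5 (the weakest flagged pair is Aromaticity with Oxidized_coefficient at 0.596), while the strongest unflagged pair is Aromaticity with Helix_fraction at 0.454. A cut-off of 0.5 therefore reproduces the alert list, though the threshold actually configured for this report is an assumption. A sketch that groups partners the way the alerts do, using an excerpt of the pairs above; `high_correlations` is hypothetical:

```python
from typing import Dict, List, Tuple

# Excerpt of the matrix above (symmetric, so one triangle is enough).
CORRELATIONS: Dict[Tuple[str, str], float] = {
    ("Aromaticity", "Helix_fraction"): 0.454,
    ("Aromaticity", "Oxidized_coefficient"): 0.596,
    ("Aromaticity", "Reduced_coefficient"): 0.600,
    ("Molecular_weight", "Oxidized_coefficient"): 0.627,
    ("Molecular_weight", "Reduced_coefficient"): 0.620,
    ("Oxidized_coefficient", "Reduced_coefficient"): 0.998,
    ("Sheet_fraction", "Turn_fraction"): -0.325,
    ("Start", "Stop"): 0.999,
}


def high_correlations(
    corr: Dict[Tuple[str, str], float], threshold: float = 0.5
) -> Dict[str, List[str]]:
    """Map each variable to the partners whose |r| exceeds the threshold."""
    partners: Dict[str, List[str]] = {}
    for (a, b), r in corr.items():
        if abs(r) > threshold:
            partners.setdefault(a, []).append(b)
            partners.setdefault(b, []).append(a)
    return {k: sorted(v) for k, v in sorted(partners.items())}
```

On this excerpt Aromaticity maps to Oxidized_coefficient and Reduced_coefficient, matching the overview's "correlated with Oxidized_coefficient and 1 other fields".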
Sample
First rows
| | Phage_ID | Protein_source | Function_prediction_source | Start | Stop | Strand | Protein_ID | Product | Protein_classification | Molecular_weight | Aromaticity | Instability_index | Isoelectric_point | Helix_fraction | Turn_fraction | Sheet_fraction | Reduced_coefficient | Oxidized_coefficient | Phage_source | Function_Prediction_source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NC_013650.1 | RefSeq | RefSeq | 64948 | 65748 | + | YP_003347791.1 | hypothetical protein | hypothetical; | 6435.9765 | 0.107143 | 35.989286 | 5.868706 | 0.250000 | 0.178571 | 0.178571 | 11460 | 11460 | RefSeq | NaN |
| 1 | NC_021349.1 | RefSeq | RefSeq | 81651 | 81983 | + | YP_008061629.1 | HNH endonuclease | packaging; | 4363.8986 | 0.075000 | 61.070000 | 8.070620 | 0.275000 | 0.300000 | 0.175000 | 11000 | 11125 | RefSeq | NaN |
| 2 | NC_010392.1 | RefSeq | RefSeq | 12850 | 13449 | - | YP_001700595.1 | bacteriophage tail tip assembly protein%3B Lambda gpK homolog | assembly;infection; | 6944.7367 | 0.118644 | 39.355932 | 9.416139 | 0.237288 | 0.152542 | 0.237288 | 19480 | 19605 | RefSeq | NaN |
| 3 | NC_021071.1 | RefSeq | RefSeq | 106486 | 106767 | - | YP_007877716.1 | hypothetical protein | hypothetical; | 2866.2480 | 0.173913 | 111.039130 | 5.274624 | 0.434783 | 0.086957 | 0.260870 | 4470 | 4470 | RefSeq | NaN |
| 4 | NC_019510.1 | RefSeq | RefSeq | 10478 | 12196 | + | YP_007005441.1 | DNA primase/helicase | replication; | 1412.3706 | 0.166667 | 38.191667 | 4.050028 | 0.166667 | 0.166667 | 0.166667 | 5500 | 5500 | RefSeq | NaN |
| 5 | NC_020857.1 | RefSeq | RefSeq | 24889 | 25374 | + | YP_007675378.1 | hypothetical protein | hypothetical; | 2274.4383 | 0.047619 | 18.695238 | 5.663801 | 0.190476 | 0.142857 | 0.047619 | 0 | 0 | RefSeq | NaN |
| 6 | NC_019519.1 | RefSeq | RefSeq | 30637 | 31104 | + | YP_007006505.1 | hypothetical protein | hypothetical; | 1923.2424 | 0.133333 | 174.693333 | 10.834379 | 0.333333 | 0.133333 | 0.333333 | 5500 | 5500 | RefSeq | NaN |
| 7 | NC_005083.2 | RefSeq | RefSeq | 61372 | 61992 | + | NP_899350.1 | hypothetical protein | hypothetical; | 7769.5234 | 0.151515 | 47.701515 | 4.846286 | 0.348485 | 0.212121 | 0.242424 | 22460 | 22585 | RefSeq | NaN |
| 8 | NC_004589.1 | RefSeq | RefSeq | 5022 | 5225 | + | NP_795671.1 | hypothetical protein | hypothetical; | 7866.6832 | 0.134328 | 52.959701 | 4.093794 | 0.402985 | 0.194030 | 0.343284 | 15930 | 15930 | RefSeq | NaN |
| 9 | NC_011811.1 | RefSeq | RefSeq | 12804 | 12938 | + | YP_002456045.1 | hypothetical protein | hypothetical; | 4856.8811 | 0.068182 | 39.104545 | 9.182957 | 0.431818 | 0.136364 | 0.363636 | 0 | 0 | RefSeq | NaN |
Last rows
| | Phage_ID | Protein_source | Function_prediction_source | Start | Stop | Strand | Protein_ID | Product | Protein_classification | Molecular_weight | Aromaticity | Instability_index | Isoelectric_point | Helix_fraction | Turn_fraction | Sheet_fraction | Reduced_coefficient | Oxidized_coefficient | Phage_source | Function_Prediction_source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 49990 | biochar_2976 | prodigal | NaN | 5288 | 5470 | + | biochar_2976_10 | unknown | unsorted; | 6655.6389 | 0.050000 | 58.293333 | 4.050028 | 0.350000 | 0.233333 | 0.416667 | 6990 | 6990 | STV | - |
| 49991 | biochar_953 | prodigal | NaN | 35109 | 35303 | + | biochar_953_49 | unknown | unsorted; | 7437.2190 | 0.093750 | 44.607813 | 4.425791 | 0.234375 | 0.218750 | 0.250000 | 22000 | 22125 | STV | - |
| 49992 | biochar_1080 | prodigal | NaN | 23883 | 24353 | + | biochar_1080_38 | virion structural protein | assembly;infection; | 1650.7364 | 0.000000 | 72.950000 | 4.050028 | 0.312500 | 0.312500 | 0.250000 | 0 | 0 | STV | Iterative search |
| 49993 | biochar_2262 | prodigal | NaN | 5373 | 7160 | + | biochar_2262_4 | hypothetical protein | hypothetical; | 3901.3247 | 0.085714 | 61.511429 | 4.860382 | 0.257143 | 0.314286 | 0.257143 | 1490 | 1490 | STV | eggNOG-mapper |
| 49994 | biochar_4542 | prodigal | NaN | 9883 | 10386 | - | biochar_4542_21 | unknown | unsorted; | 2939.4031 | 0.111111 | 22.537037 | 4.050028 | 0.407407 | 0.296296 | 0.296296 | 5500 | 5500 | STV | - |
| 49995 | biochar_1665 | prodigal | NaN | 23018 | 23911 | + | biochar_1665_32 | glycosyl transferase family 8 | immune; | 2185.4872 | 0.117647 | 60.864706 | 9.821001 | 0.294118 | 0.058824 | 0.058824 | 1490 | 1490 | STV | Iterative search |
| 49996 | biochar_1081 | prodigal | NaN | 499 | 1506 | - | biochar_1081_2 | Part of the outer membrane protein assembly complex, which is involved in assembly and insertion of beta-barrel proteins into the outer membrane | integration;assembly; | 5763.4226 | 0.090909 | 37.052727 | 4.245270 | 0.254545 | 0.309091 | 0.254545 | 8480 | 8480 | STV | eggNOG-mapper |
| 49997 | biochar_2772 | prodigal | NaN | 13710 | 13892 | - | biochar_2772_19 | unknown | unsorted; | 6793.7002 | 0.066667 | 71.563333 | 10.509265 | 0.166667 | 0.283333 | 0.116667 | 6990 | 7115 | STV | - |
| 49998 | biochar_5776 | prodigal | NaN | 6680 | 7210 | - | biochar_5776_16 | unknown | unsorted; | 3992.5418 | 0.111111 | 72.466667 | 11.001481 | 0.222222 | 0.250000 | 0.250000 | 6990 | 6990 | STV | - |
| 49999 | biochar_1198 | prodigal | NaN | 15560 | 16285 | + | biochar_1198_26 | unknown | unsorted; | 3391.8735 | 0.032258 | 74.732258 | 5.967549 | 0.161290 | 0.290323 | 0.387097 | 5500 | 5500 | STV | - |